An Error Analysis Technique for
Statistical Hypothesis Testing

Generation of Jointly Distributed Random Variates


1.  INTRODUCTION 

There are two main themes in this report. The first is an assessment of the type of hypothesis test used in our terrain characterization studies.1,2 The second is the construction of a numerical technique to allow us to carry out that assessment. Computer-generated, jointly distributed sets of random variates are used in a Monte Carlo formulation that verifies the decision errors to be expected for various hypothesis tests. There is a need for this type of approach since appropriate analytic results cannot always be obtained.

In the initial characterization study, the problem of interest is that of selecting appropriate statistical quantities to describe a large terrain region that is considered to be made up of smaller subareas (~4 km²). The main topographic feature is the distribution of heights within these subregions. This places some constraints on the form of the particular statistical analysis carried out for the report. The eventual goal of these studies is the development of mathematical descriptions of each of the subareas for use in calculation of the scattering of electromagnetic waves from the uneven terrain surface. Each region is characterized by a geologic code and several statistical parameters. In particular, we are concerned with being able to associate a probability density function (PDF) with the range of heights in the subregions and to determine parameters that make the general PDF explicit.

(Received for publication 20 March 1981)

1. Lennon, J.F., and Papa, R.J. (1980) Statistical Characterization of Rough Terrain, RADC-TR-80-9, RADC/EE Hanscom AFB, MA.

2. Papa, R.J., Lennon, J.F., and Taylor, R.L. (1980) Electromagnetic Wave Scattering From Rough Terrain, RADC-TR-80-300, RADC/EE Hanscom AFB, MA.

When we start from some observed values, the most general situation could be that neither the form of the PDF nor the parameters that enter into the expression for the PDF are known. In order to obtain some information on these parameters, estimation theory is used. Then, the parameters are incorporated into a PDF, the form of which is not known a priori and must be determined. A hypothesis testing procedure is developed to accomplish this.

The particular hypothesis testing procedure selected in the present case allows only a binary decision process to be considered. Hence, the discrimination is restricted to two forms of the PDF. The test is based upon the maximum a posteriori probability criterion.3,4 This is equivalent to the minimum error probability criterion. The procedure for hypothesis testing may be applied quite generally. Indeed, where the number of cases allows, a better test would be to see which density would be more likely to have generated several realizations rather than the single one used here.

For whatever types of PDFs and hypothesis tests are employed in the characterization, there is still the concern as to the errors involved in the decision process. In using the results in various electromagnetic calculations of multipath and clutter, it is desirable to be able to assess the reasonableness of the terrain feature distributions that have been decided upon, and to understand the implications of the decisions.

In the form of hypothesis test used in the present studies, the decision is between simple alternatives. The respective probabilities are written as a quotient and the decision statistic, a function of the random variates, is determined. The decision is based on the location of the statistic value with respect to the two regions of the decision space assigned to the respective hypotheses. In the case of simple alternatives, the two regions, although not necessarily connected, do cover the entire decision space. The concepts of the decisions and hypothesis testing in general will be discussed more fully in Section 4.

One aspect of this topic that should be kept in mind is that this particular test is concerned with minimizing not the errors of each alternative hypothesis but only the total error of an incorrect decision where both possibilities are taken into account simultaneously. Thus, based on the forms of the alternative densities, it may occur that the hypothesis test would allow decision regions that produce relatively high errors in one alternative, so long as the total error is minimized. If this is not acceptable, other forms of the test will have to be devised.

3. Jenkins, G.M., and Watts, D.G. (1968) Spectral Analysis and Its Applications, Holden-Day, San Francisco, CA.

4. Whalen, A.D. (1971) Detection of Signals in Noise, Academic Press, New York, NY.

The actual assessment of the errors involved in the decision sometimes can be made analytically. One such case occurs when the function of the random variates that represents the decision statistic is sufficiently simple to allow calculation of its cumulative probability over the various decision regions. In general, though, when we are concerned with sequences of random variates, dissimilar PDFs, and probabilities that are not independent, the statistic is often intractable analytically. In such instances an alternative approach to assessing the possible decision errors has to be employed. Since we are concerned with a statistical process in the sense of determining that a specific set of observables $(Z_1, Z_2, \ldots, Z_N)$ of the set of jointly distributed variates $(z_1, z_2, \ldots, z_N)$ is from a particular distribution, one possibility is to approach the problem from a Monte Carlo point of view.5 To implement that approach, we use a computer model to generate sets of jointly distributed random variates from selected probability densities. Next, the corresponding form of the hypothesis test is applied to a large number of these independent observations of the variates to determine the probability of an incorrect decision. The probability is modeled on the basis that there are a given number of incorrect decisions by the test in some large number of tries. The key to this approach, then, is to be able to generate the desired sets of variates having the given joint probability density functions. This is relatively straightforward when the variates are independent, but can be quite complex when they are not.

The report first addresses this problem of producing the sets of variates. Then, specific cases are considered. Some aspects of the hypothesis testing are treated. The results of the Monte Carlo approach in evaluating the tests are then compared with some analytic results. Finally, the specific application to terrain height analysis is developed and the implications of the errors are discussed.


2. GENERATION OF NONUNIFORM RANDOM VARIATES

A standard tool in statistical theory is the generation of pseudorandom numbers based on the uniform probability distribution. Variates representing other distributions can also be determined. In this section we will discuss several aspects of the general theory. Both single variate and multivariate forms will be considered. Independent and dependent distributions will be examined separately.

5. Shreider, Yu. A. (1966) The Monte Carlo Method, Pergamon Press, Oxford.
\[ U_n = \int_{-\infty}^{Z_n} p(z)\,dz , \]

where $0 < U_n \le 1$ and $n = 1, 2, 3, \ldots$

A  detailed  discussion  of  this  relation  is  found  in  Shreider. 

The procedure used to generate random variates, $Z_n$, having the probability density $p(z)$ is based upon the above relation. First, standard computer algorithms are used to generate a set of uniformly distributed random variates, $U_n$. Next, a probability density function $p(z)$ is selected. Then, in the integrals

\[ F_n = \int_{-\infty}^{Z_n} p(z)\,dz , \]

the upper limits, $Z_n$, are determined such that $F_n = U_n$.

If $F(Z_n)$ can be formed analytically in terms of known functions, then $Z_n$ can be determined directly by finding the inverse relation such that $Z_n = F^{-1}(U_n)$. Examples of this case include the Gaussian density,

\[ p_G(z) = (2\pi\sigma^2)^{-1/2} \exp[-z^2/2\sigma^2] , \]

and the Laplacian density,

\[ p_L(z) = (\alpha/2) \exp[-\alpha |z|] . \]

Alternatively, if $F_n(Z_n)$ has to be evaluated numerically (using some quadrature formula, such as Simpson's rule) then the value of $Z_n$ such that $F_n(Z_n) = U_n$ is obtained by iteration.
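
The two routes just described can be sketched in a few lines of Python. This is only an illustration, not the program used for the report: the function names are hypothetical, the Laplacian is taken with $\alpha = \sqrt{2}$ (unit variance), and plain bisection stands in for whatever iteration scheme was actually employed. The Gaussian distribution function is written with the error function so that the same bisection routine can be applied to it.

```python
import math
import random

def laplace_inverse_cdf(u, alpha=math.sqrt(2.0)):
    """Closed-form Z = F^{-1}(U) for p(z) = (alpha/2) exp(-alpha |z|), 0 < u < 1."""
    if u < 0.5:
        return math.log(2.0 * u) / alpha
    return -math.log(2.0 * (1.0 - u)) / alpha

def gaussian_cdf(z):
    """F(z) for the zero mean, unit variance Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def invert_by_bisection(cdf, u, lo=-40.0, hi=40.0, tol=1e-12):
    """Find Z with cdf(Z) = u by iteration (bisection); cdf must be nondecreasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = random.Random(1981)  # arbitrary seed, chosen only for repeatability
uniforms = [u for u in (rng.random() for _ in range(5)) if u > 0.0]
laplace_variates = [laplace_inverse_cdf(u) for u in uniforms]
gaussian_variates = [invert_by_bisection(gaussian_cdf, u) for u in uniforms]
```

Bisection is slow but cannot diverge, which makes it a reasonable default when the distribution function is only available through quadrature.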
So far, we have been discussing the relation between sets of uniform random numbers individually related to corresponding sets satisfying a different probability distribution. Next, we address the question of the relation between jointly distributed sets of random variates and equivalently sized groups of uniform random numbers.

2.2  Jointly  Distributed  Random  Variates 

In order to consider this case, we look at the relation between the joint density function and its associated conditional densities,6,7

\[ \begin{aligned} p(z_1, z_2, \ldots, z_N) &= p(z_N | z_{N-1}, \ldots, z_1)\, p(z_{N-1}, \ldots, z_1) \\ &= p(z_N | z_{N-1}, \ldots, z_1)\, p(z_{N-1} | z_{N-2}, \ldots, z_1) \cdots p(z_2 | z_1)\, p(z_1) . \end{aligned} \]

We use these relations and the definitions relating to cumulative distribution functions and consider, just as for the single random variate case, a set of jointly distributed variates $(Z_1, \ldots, Z_N)$ that occupy a volume $\Delta V_z = \Delta z_1 \Delta z_2 \cdots \Delta z_N$ in a joint probability space. These variates are related to a corresponding set of uniformly distributed random variates $(U_1, U_2, \ldots, U_N)$ having the associated volume $\Delta V_u = \Delta u_1 \Delta u_2 \cdots \Delta u_N$. The distinction here is that we now work with successive single-variate, conditional densities, each of which depends on the previously determined elements of the set. To illustrate the specifics of the procedure, we will first outline the two-variate case and then proceed to the general multivariate case.

For two variates we have

\[ p(z_1, z_2) = p(z_2 | z_1)\, p(z_1) . \]

Here $p(z_1)$ is the marginal density given by

\[ p(z_1) = \int_{-\infty}^{\infty} p(z_1, z_2)\,dz_2 . \]

6. Papoulis, A. (1965) Probability, Random Variables and Stochastic Processes, McGraw-Hill, New York, NY.

7. Mood, A.M., and Graybill, F.A. (1963) Introduction to the Theory of Statistics, McGraw-Hill, New York, NY.
Next, the value $Z_1$ is substituted into the conditional probability $p(z_2 | Z_1)$ and the integral

\[ F(Z_1, Z_2) = \int_{-\infty}^{Z_2} p(z_2 | Z_1)\,dz_2 \]

is constructed. Now this integral is of the same form as the integrals considered in Section 2.1, so that a second random variate $Z_2$ may be generated, which belongs to the density $p(z_2 | Z_1)$. This is accomplished by using the relation

\[ U_2 = \int_{-\infty}^{Z_2} p(z_2 | Z_1)\,dz_2 . \]

The two random variates, $Z_1$ and $Z_2$, satisfying respectively the univariate marginal and the related conditional density, jointly have the density $p(z_1, z_2)$. The procedure can be repeated as often as desired to obtain multiple realizations, $(Z_1, Z_2)_n$, of the joint distribution.

For the multivariate case we use the univariate marginal, the successive conditional densities, and the corresponding marginal densities,

\[ \begin{aligned} p(z_1) &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(z_1, z_2, \ldots, z_N)\,dz_2\,dz_3 \cdots dz_N ; \\ p(z_2 | Z_1) &= p(z_2, Z_1)/p(Z_1) ; \\ p(z_3 | Z_1, Z_2) &= p(z_3, Z_2, Z_1)/p(Z_1, Z_2) ; \\ &\;\;\vdots \\ p(z_N | Z_1, Z_2, \ldots, Z_{N-1}) &= p(z_N, Z_{N-1}, \ldots, Z_1)/p(Z_1, Z_2, \ldots, Z_{N-1}) . \end{aligned} \]

The set of variates having the joint probability density $p(z_1, z_2, \ldots, z_N)$ is then obtained by the successive integrations

\[ \begin{aligned} U_1 &= \int_{-\infty}^{Z_1} p(z_1)\,dz_1 ; \\ U_2 &= \int_{-\infty}^{Z_2} p(z_2 | Z_1)\,dz_2 ; \\ &\;\;\vdots \\ U_N &= \int_{-\infty}^{Z_N} p(z_N | Z_1, Z_2, \ldots, Z_{N-1})\,dz_N . \end{aligned} \]

Again, the sequence can be repeated as often as desired for multiple realizations $(Z_1, Z_2, \ldots, Z_N)_n$.
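
A small two-variate illustration of the marginal-then-conditional sequence is given below. The density used, $p(z_1, z_2) = z_1 + z_2$ on the unit square, is a textbook toy case chosen here only because both integrals invert in closed form; it is not one of the densities treated in this report, and the function name is hypothetical.

```python
import math
import random

def sample_pair(u1, u2):
    """Map two uniforms to (Z1, Z2) with joint density p(z1, z2) = z1 + z2 on [0, 1] x [0, 1]."""
    # Marginal: p(z1) = z1 + 1/2, so U1 = Z1^2/2 + Z1/2. Solve the quadratic for Z1.
    z1 = 0.5 * (-1.0 + math.sqrt(1.0 + 8.0 * u1))
    # Conditional: p(z2 | Z1) = (Z1 + z2) / (Z1 + 1/2), so
    # U2 = (Z1*Z2 + Z2^2/2) / (Z1 + 1/2). Solve the quadratic for Z2.
    z2 = -z1 + math.sqrt(z1 * z1 + 2.0 * u2 * (z1 + 0.5))
    return z1, z2

rng = random.Random(42)  # arbitrary seed
realizations = [sample_pair(rng.random(), rng.random()) for _ in range(1000)]
```

The second variate depends on the first through the conditional density, so the pair is not independent even though each step is a single-variate inversion.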

2.3  Independent  Joint  Distributions 

The random variates in the joint density function, $p(z_1, z_2, \ldots, z_N)$, are independent if and only if

\[ p(z_1, z_2, \ldots, z_N) = p_1(z_1)\, p_2(z_2) \cdots p_N(z_N) \]

for all values for which the random variates are defined. Similarly, the cumulative distribution of $N$ independent random variates is given by

\[ \begin{aligned} F(\zeta_1, \zeta_2, \ldots, \zeta_N) &= \int_{-\infty}^{\zeta_1} dz_1 \int_{-\infty}^{\zeta_2} dz_2 \cdots \int_{-\infty}^{\zeta_N} dz_N\; p_1(z_1)\, p_2(z_2) \cdots p_N(z_N) \\ &= F_1(\zeta_1)\, F_2(\zeta_2) \cdots F_N(\zeta_N) \end{aligned} \]

where

\[ F_m(\zeta_m) = \int_{-\infty}^{\zeta_m} dz_m\, p_m(z_m) . \]

Thus, when we have the case of independent variates where the probability density functions are the same for all the variates, the multivariate random number procedure is simplified. It reduces to the situation where all that is required is successive implementations of the single variate determination. We then have a set $(Z_1, Z_2, \ldots, Z_N)$ of independent variates each satisfying the corresponding probability density and, as before, additional sets can be generated.

3.  EXAMPLES 

In the preceding section we have discussed the general technique for producing sets of random variates that satisfy given joint probability density relations. Both the simplified procedure for the case where the variates are independent and the more complicated form when they are not independent were outlined. In the various analyses treated in this report several different density functions were used and these examples will be discussed in this section.

3.1 Independent Variates

For the case where the jointly distributed variates are independent, the only form required is that of the single variate marginal density. Two types are used, Gaussian and Laplace. For all the generating routines, the variates are taken to satisfy zero mean, unit variance distributions. The implications of this assumption will be discussed in Section 7.

The first case is the form for the multivariate Gaussian density:

\[ p_G(z_1, z_2, \ldots, z_N) = (2\pi)^{-N/2} \exp\left[-\tfrac{1}{2}\left(z_1^2 + z_2^2 + \cdots + z_N^2\right)\right] . \]

The corresponding form for the multivariate Laplace density is

\[ p_L(z_1, z_2, \ldots, z_N) = (\sqrt{2})^{-N} \exp\left[-\sqrt{2}\left(|z_1| + |z_2| + \cdots + |z_N|\right)\right] . \]

These two forms are the only two cases of independent variates that were generated in the course of the study. All the other results are for general distributions where the variates are not independent.


3.2  Nonindependent  Variates 

In this category, the fact that the joint densities cannot be written as products of single variate independent probabilities means that the successive marginal and conditional densities must be included in the random number generation (RNG) process. It should be noted that in these cases we will be dealing with uncorrelated forms of the variates where the uncorrelated variates are not necessarily independent. Three specific examples were considered in various phases of the overall study. For each of these cases we will indicate the general N-variate form of the PDF and the associated general marginal density relation.

The first form is that of the particular version of the multivariate exponential density used in our initial terrain analysis:


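\[ p_E(z_1, \ldots, z_N) = \frac{(N+1)^{N/2}}{2^{(N+1)/2}\,(2\pi)^{(N-1)/2}\,\Gamma\!\left(\frac{N+1}{2}\right)} \exp\left[-\sqrt{N+1}\,\sqrt{z_1^2 + z_2^2 + \cdots + z_N^2}\right] . \]

Its general L-variate marginal density is

\[ p_E(z_1, \ldots, z_L) = \frac{(N+1)^{(N+L+1)/4}}{2^{(N+L-1)/2}\,\pi^{L/2}\,\Gamma\!\left(\frac{N+1}{2}\right)} \left(z_1^2 + z_2^2 + \cdots + z_L^2\right)^{(N-L+1)/4} K_{(N-L+1)/2}\!\left(\sqrt{N+1}\,\sqrt{z_1^2 + z_2^2 + \cdots + z_L^2}\right) \quad \text{for } L \le N , \]

where $\Gamma(u)$ is the Gamma function and $K_\nu(x)$ is the modified Bessel function of the second kind. This form and those of the following two cases have been described previously.8

8. Lennon, J.F. (1980) The Derivation of a Multivariate Probability Density Function Having an Exponential-Type Bivariate Marginal Density, RADC-TR-80-153, RADC/EE Hanscom AFB, MA.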
The second PDF that is of interest is the form of Bessel function density8 that has a bivariate exponential marginal density:


The  final  example  of  nonindependent  variates,  though,  was  developed  specifically 
to  verify  some  elements  of  this  analysis  and  the  discussion  of  the  determination  of 
the  appropriate  normalization  factors  for  the  PDF  will  be  presented  in  Appendix  A. 

This  final  form  resembles  a  Gamma  type  distribution  and  will  be  referred  to 
in  those  terms  in  following  sections: 

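\[ p_\Gamma(z_1, \ldots, z_N) = C_1 \left(|z_1| + \cdots + |z_N|\right) \exp\left[-C_2\left(|z_1| + \cdots + |z_N|\right)\right] \]

where

\[ C_1 = C_2^{N+1}/(N\,2^N) \quad \text{and} \quad C_2 = \sqrt{2(N+2)/N} . \]

The marginal density form is

\[ p_\Gamma(z_1, \ldots, z_L) = \left(\frac{2^{N-L}\,C_1}{C_2^{\,N-L+1}}\right) \left[(N-L) + Y\right] \exp(-Y) \quad \text{for } L \le N , \]

where

\[ Y = \sqrt{2(N+2)/N}\,\left(|z_1| + \cdots + |z_L|\right) . \]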


3.3  Illustration  of  the  Procedure 

In Section 2, the overall approach to generating random numbers having specified probability densities was described. This section has presented the particular types of PDF. We will now proceed to outline the technique used to obtain the desired sets of variates. In the two examples we will use N = 5 as the number of jointly distributed variates.

For both independent and nonindependent cases the starting point is the same. We use standard computer techniques to generate variates satisfying the uniform distribution in the range 0 to 1. The computer algorithms produce a sequence of pseudorandom numbers, $\alpha_i$, that have weak statistical correlation; the PDF of the pseudorandom numbers approximates a uniform PDF; and the program is stable, that is, the PDF remains the same for all output variates. For the actual program used, the algorithm is

\[ \alpha_{i+1} = 2^{-31}\,\beta_{i+1} ; \quad \beta_0 = 466364003.0 \]

and

\[ \beta_{i+1} = 16807\,\beta_i \pmod{2^{31} - 1} . \]
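
A minimal Python sketch of a generator of this multiplicative congruential type follows. The multiplier 16807, the modulus $2^{31} - 1$, and the starting value 466364003 are taken from the relations above; scaling each integer by $2^{-31}$ to obtain a value between 0 and 1 is our reading of the first relation and should be treated as an assumption. The function name is hypothetical.

```python
MODULUS = 2**31 - 1      # prime modulus of the congruential recursion
MULTIPLIER = 16807       # multiplier quoted in the text
SEED = 466364003         # starting value quoted in the text

def congruential_uniforms(count, seed=SEED):
    """Return `count` pseudorandom numbers in (0, 1) from beta <- 16807 * beta mod (2^31 - 1)."""
    beta = seed
    uniforms = []
    for _ in range(count):
        beta = (MULTIPLIER * beta) % MODULUS
        # Assumed scaling: divide by 2^31 so every output lies strictly inside (0, 1).
        uniforms.append(beta / 2**31)
    return uniforms

alphas = congruential_uniforms(5)
```

Because the modulus is prime and the seed is nonzero, the integer state never reaches zero, so no output is exactly 0 or 1 and each can be passed directly to an inverse distribution function.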

The resultant sequences are then transformed into the desired variates. For the independent Gaussian case we just apply the relation

\[ U_i = \int_{-\infty}^{Z_i} (2\pi)^{-1/2} \exp(-z^2/2)\,dz , \quad i = 1, 2, \ldots, 5 , \]

five successive times to form the set of desired jointly distributed normal variates, $\{Z_i\}$. The evaluation is based on standard representations of the inverse error function for the required ranges of $z$.

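If we use our form of exponential PDF as the example for the nonindependent case, the sequence of five numerical integrations required to determine the jointly distributed set of variates is

\[ U_1 = \int_{-\infty}^{Z_1} \left(3\sqrt{6}/16\right)\left(1 + \sqrt{6}\,|z_1| + 2 z_1^2\right) \exp\left(-\sqrt{6}\,|z_1|\right) dz_1 , \]

\[ U_2 = \int_{-\infty}^{Z_2} \left(p(Z_1)\right)^{-1} \left(9/4\pi\right)\left(Z_1^2 + z_2^2\right) K_2\!\left(\sqrt{6}\,\sqrt{Z_1^2 + z_2^2}\right) dz_2 , \]

\[ U_3 = \int_{-\infty}^{Z_3} \left(p(Z_1, Z_2)\right)^{-1} \left(3\sqrt{6}/16\pi\right)\left(1 + \sqrt{6}\,\sqrt{Z_1^2 + Z_2^2 + z_3^2}\right) \exp\left[-\sqrt{6}\,\sqrt{Z_1^2 + Z_2^2 + z_3^2}\right] dz_3 , \]

\[ U_4 = \int_{-\infty}^{Z_4} \left(p(Z_1, Z_2, Z_3)\right)^{-1} \left(9\sqrt{6}/8\pi^2\right) \sqrt{Z_1^2 + Z_2^2 + Z_3^2 + z_4^2}\; K_1\!\left(\sqrt{6}\,\sqrt{Z_1^2 + Z_2^2 + Z_3^2 + z_4^2}\right) dz_4 , \]

and

\[ U_5 = \int_{-\infty}^{Z_5} \left(p(Z_1, Z_2, Z_3, Z_4)\right)^{-1} \left(9\sqrt{6}/16\pi^2\right) \exp\left[-\sqrt{6}\,\sqrt{Z_1^2 + Z_2^2 + Z_3^2 + Z_4^2 + z_5^2}\right] dz_5 . \]

The sequence $\{Z_1, Z_2, Z_3, Z_4, Z_5\}$ obtained from the successive numerical integrations now forms the set of jointly distributed random variates. It should be noted that not all cases require numerical integration; for some PDFs the integration may be analytic.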
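
As an illustration of a case where the integrations are analytic, the sketch below applies the successive conditional procedure to the Gamma-type density of Section 3.2. Dividing the L-variate marginal by the (L-1)-variate marginal gives a conditional density that is symmetric in $z_L$, and its tail probability works out (our derivation, not a formula quoted from the report) to $\tfrac{1}{2} e^{-t}\,[1 + t/(k+1)]$ with $t = C_2 |z_L|$ and $k = (N-L) + C_2(|Z_1| + \cdots + |Z_{L-1}|)$. Only the final inversion is done by iteration. Function names are hypothetical and Python's built-in generator stands in for the uniform source.

```python
import math
import random

def tail(t, k):
    """exp(-t) * (1 + t/(k+1)): twice the conditional probability that C2*|z| exceeds t."""
    return math.exp(-t) * (1.0 + t / (k + 1.0))

def gamma_type_variates(n_var, rng):
    """One realization (Z_1, ..., Z_N) of the Gamma-type density by successive conditional inversion."""
    c2 = math.sqrt(2.0 * (n_var + 2) / n_var)
    variates = []
    abs_sum = 0.0  # |Z_1| + ... + |Z_{L-1}| for the variates already fixed
    for level in range(1, n_var + 1):
        k = (n_var - level) + c2 * abs_sum
        u = rng.random()
        # The conditional density is symmetric in z, so fold U onto one tail.
        target = 2.0 * u if u < 0.5 else 2.0 * (1.0 - u)
        lo, hi = 0.0, 1.0
        while tail(hi, k) > target:  # expand until the root is bracketed
            hi *= 2.0
        for _ in range(50):          # bisection on the decreasing tail function
            mid = 0.5 * (lo + hi)
            if tail(mid, k) > target:
                lo = mid
            else:
                hi = mid
        magnitude = 0.5 * (lo + hi) / c2
        variates.append(-magnitude if u < 0.5 else magnitude)
        abs_sum += magnitude
    return variates

example_set = gamma_type_variates(5, random.Random(1981))  # one five-variate set
```

Each pass through the loop consumes one uniform number and fixes one variate, so the structure mirrors the five successive integrations written out above.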
4. HYPOTHESIS TESTING

As explained in Morrison,9 statistical inference may be divided into two general categories. The first category is concerned with the estimation of distribution functions, the parameters of such functions when their mathematical form is specified, or the parameters of models related to random variables. The second category is devoted to the problem of testing the validity of hypotheses about distribution functions and their parameters. This section is concerned with statistical inference in the second sense. A further restriction is that only simple alternative hypothesis testing will be considered.

As was discussed in the report by Lennon and Papa,1 the concern for our particular case is to determine whether a set of data (for example, terrain heights) is better described by a Gaussian or by another probability density function (PDF).

Let hypothesis $H_1$ correspond to the case where the data set is from a Gaussian distribution and hypothesis $H_0$ correspond to the case where the data set is from some other distribution. Let $Y$ denote the values assumed by the random variable from which the data set is assumed to originate. Then let $P(H_0 | Y)$ be the probability that $H_0$ is true given the observation $Y$ and let $P(H_1 | Y)$ be the probability that $H_1$ is true given the observation $Y$. The discussion of hypothesis testing follows Whalen.4 Let $p_0(y) \equiv p(y | H_0)$ represent the probability describing the data, given that $H_0$ is true, and let $p_1(y) \equiv p(y | H_1)$ denote the probability for the data given that $H_1$ is true. The sample space of observations $y$ may be divided into two regions $R_0$ and $R_1$ such that if a sample point $Y_i \in y$ falls into $R_0$ then $H_0$ is chosen (denoted as decision $D_0$) and if the sample point falls into $R_1$ then $H_1$ is chosen (denoted as decision $D_1$).

For both hypotheses, the assignment of regions and the associated decision process involve two possible errors, type I being the rejection of a true hypothesis and type II being the acceptance of a false one. For example, defining the bounds of $R_0$ neglects both $P(Y \in R_1 | H_0)$ and $P(Y \in R_0 | H_1)$.

In terms of decisions relating to the hypothesis test evaluations, we postulate the density a priori and generate large numbers of sample observations from that given density. Thus what we are concerned with is the type I error for the validation procedure. For instance, if we have $H_0$ as true then the error is

\[ P(D_1 | H_0) = \int_{R_1} p_0(y)\,dy , \]

and correspondingly if we select the density of $H_1$ to generate the collection of sample sets, then the error that we evaluate with this technique is

\[ P(D_0 | H_1) = \int_{R_0} p_1(y)\,dy . \]

It should be noted that when the hypothesis test approach is being applied to data from unknown densities and a decision made as to the most appropriate hypothesis, then the two types of error must be considered for each observation, $Y_i$.

9. Morrison, D.F. (1976) Multivariate Statistical Methods, McGraw-Hill, New York, NY.
4.1 Background

In testing hypotheses, one aspect is assessing relative costs of correct and incorrect decisions. We define $C_{ij}$ as the cost associated with choosing hypothesis $H_i$ when hypothesis $H_j$ is true. In this context, the approach (known as the Bayes strategy) is to formulate relations for the average cost or risk for a decision and then to minimize this average cost. The average cost for the decision procedure is

\[ C = P(H_0)\left[P(D_0 | H_0)\,C_{00} + P(D_1 | H_0)\,C_{10}\right] + P(H_1)\left[P(D_0 | H_1)\,C_{01} + P(D_1 | H_1)\,C_{11}\right] . \]


By using the relations

\[ \begin{aligned} P(H_1) &= 1 - P(H_0) \\ P(D_1 | H_1) &= 1 - P(D_0 | H_1) \\ P(D_1 | H_0) &= 1 - P(D_0 | H_0) \\ P(D_0 | H_0) &= \int_{R_0} p_0(y)\,dy \\ P(D_0 | H_1) &= \int_{R_0} p_1(y)\,dy \end{aligned} \]

the average cost may be expressed as

\[ C = P(H_0)\,C_{10} + [1 - P(H_0)]\,C_{11} + \int_{R_0} \left\{ [1 - P(H_0)]\,(C_{01} - C_{11})\,p_1(y) - P(H_0)\,(C_{10} - C_{00})\,p_0(y) \right\} dy . \]

This expression for $C$ may be minimized by including in the region $R_0$ only that portion of the $y$ domain for which the integrand is negative. The region $R_0$ where $H_0$ is chosen is the region where

\[ P(H_0)\,(C_{10} - C_{00})\,p_0(y) \ge [1 - P(H_0)]\,(C_{01} - C_{11})\,p_1(y) . \]

If the likelihood ratio parameter $\lambda$ is defined as

\[ \lambda \equiv \frac{p_1(y)}{p_0(y)} , \]

then the decision rule is to choose $H_1$ if

\[ \lambda \ge \frac{P(H_0)\,(C_{10} - C_{00})}{[1 - P(H_0)]\,(C_{01} - C_{11})} . \]

If no cost is associated with a correct decision, and the errors of each kind are assigned equal cost, then

\[ C_{00} = C_{11} = 0 \quad \text{and} \quad C_{01} = C_{10} = 1 . \]

Then, the decision rule is to choose $H_1$ if

\[ \lambda = \frac{p_1(y)}{p_0(y)} \ge \frac{P(H_0)}{1 - P(H_0)} . \]

This test is identical to the maximum a posteriori probability criterion.
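
The decision rule can be written compactly in Python. The sketch below is only illustrative: the function and argument names are hypothetical, and the default costs correspond to the equal-cost case just described. The comparison is arranged as a product rather than a quotient so that a zero value of $p_0(y)$ does not cause a division error.

```python
def bayes_threshold(p_h0, c00=0.0, c01=1.0, c10=1.0, c11=0.0):
    """Threshold on the likelihood ratio: P(H0)(C10 - C00) / ([1 - P(H0)](C01 - C11))."""
    return p_h0 * (c10 - c00) / ((1.0 - p_h0) * (c01 - c11))

def choose_h1(p1_value, p0_value, p_h0=0.5, **costs):
    """True when lambda = p1/p0 is at least the Bayes threshold (decision D1)."""
    return p1_value >= bayes_threshold(p_h0, **costs) * p0_value

# Equal prior probabilities and equal error costs reduce the threshold to 1.
decision = choose_h1(0.3, 0.2)
```

With the default arguments the rule simply selects the hypothesis whose density is larger at the observed point.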
The above result is readily generalized to the case where the probability densities $p_1$ and $p_0$ are functions of N variates,

\[ p_1 = p_1(z_1, \ldots, z_N) \quad \text{and} \quad p_0 = p_0(z_1, \ldots, z_N) , \]

so that the decision rule is choose $H_1$ if

\[ \lambda \equiv \frac{p_1(z_1, \ldots, z_N)}{p_0(z_1, \ldots, z_N)} \ge \frac{P(H_0)}{1 - P(H_0)} . \]

For our case, we assume that it is equally likely that hypothesis $H_1$ or $H_0$ is true and the decision then is choose $H_1$ if

\[ \lambda \ge 1 . \]

This formulation represents the decision as a quotient of two multivariate PDFs. Thus, for the analyses discussed in this report, the requirement is to identify the form the hypothesis test assumes for each pair of multivariate densities that are being compared.


4.2 Specific Examples

In Section 3, the different multivariate probability densities that are used in the evaluation of the hypothesis testing procedure have been specified. The variates are all zero mean, unit variance, and uncorrelated. In the following sections the specific forms that the hypothesis test assumes for deciding between two alternative densities are given. Note that in all cases one of the two alternatives is the multivariate Gaussian density.

4.2.1  GAUSSIAN  AND  EXPONENTIAL  HYPOTHESES 

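For this pair of densities, the likelihood ratio parameter is given by

\[ \begin{aligned} \lambda = \frac{p_G}{p_E} &= \frac{2^{(N+1)/2}\,(2\pi)^{(N-1)/2}\,\Gamma\!\left(\frac{N+1}{2}\right)}{(2\pi)^{N/2}\,(N+1)^{N/2}} \exp\left[-Q^2/2 + \sqrt{N+1}\,Q\right] \\ &= \frac{\Gamma\!\left(\frac{N+1}{2}\right) e^{(N+1)/2}\,2^{N/2}}{\pi^{1/2}\,(N+1)^{N/2}} \exp\left[-\left(Q - \sqrt{N+1}\right)^2/2\right] \end{aligned} \]

where

\[ Q^2 = Z_1^2 + Z_2^2 + \cdots + Z_N^2 . \]

It is possible to rewrite the test in logarithmic form and assert that $H_1$ is true (the PDF is Gaussian) if $\ln \lambda \ge 0$. Then the result is: $H_1$ is true if

\[ -(1/2)\left(Q - \sqrt{N+1}\right)^2 \ge (1/2) \ln \pi - (1/2)(N+1) + (N/2) \ln\left[(N+1)/2\right] - \ln \Gamma\!\left(\tfrac{N+1}{2}\right) \]

or if

\[ \left(\sqrt{N+1} - \sqrt{2B}\right) \le Q \le \left(\sqrt{N+1} + \sqrt{2B}\right) \]

where

\[ B = \ln\left\{ \frac{\Gamma\!\left(\frac{N+1}{2}\right) e^{(N+1)/2}\,2^{N/2}}{\pi^{1/2}\,(N+1)^{N/2}} \right\} . \]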
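
The interval for $Q$ can be evaluated numerically with a few lines of Python. In this sketch $B$ is taken to be the logarithm of the constant that multiplies the exponential in $\lambda$, which is what the inequality above requires; the function name is hypothetical, and the lower bound is clipped at zero because $Q$ is nonnegative.

```python
import math

def gaussian_region_for_q(n_var):
    """Bounds (q_lower, q_upper) on Q within which the Gaussian hypothesis is chosen.

    Returns None if B is negative, in which case the Gaussian is never chosen.
    """
    b = (math.lgamma(0.5 * (n_var + 1))          # ln Gamma((N+1)/2)
         + 0.5 * (n_var + 1)                      # ln e^{(N+1)/2}
         + 0.5 * n_var * math.log(2.0)            # ln 2^{N/2}
         - 0.5 * math.log(math.pi)                # ln pi^{-1/2}
         - 0.5 * n_var * math.log(n_var + 1.0))   # ln (N+1)^{-N/2}
    if b < 0.0:
        return None
    center = math.sqrt(n_var + 1.0)
    half_width = math.sqrt(2.0 * b)
    return max(0.0, center - half_width), center + half_width

bounds_five = gaussian_region_for_q(5)  # the five-variate case used in the report
```

For N = 5 the interval is centered on $\sqrt{6}$, so observations whose radial distance $Q$ is either very small or very large are assigned to the exponential density.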


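4.2.2 GAUSSIAN AND BESSEL HYPOTHESES

For this pair the likelihood ratio has the form

\[ \begin{aligned} \lambda = \frac{p_G}{p_K} &= \left(\frac{2^{(N-1)/2}\,\pi^{(N+1)/2}}{3^{N/2}\,2^{N/2}\,\pi^{N/2}}\right) \frac{\left(\sqrt{3}\,\sqrt{Q^2}\right)^{(N-3)/2} \exp\{-Q^2/2\}}{K_{(N-3)/2}\!\left(\sqrt{3}\,\sqrt{Q^2}\right)} \\ &= \frac{\pi^{1/2}}{3^{N/2}\,\sqrt{2}}\; \frac{\left(\sqrt{3}\,\sqrt{Q^2}\right)^{(N-3)/2} \exp\{-Q^2/2\}}{K_{(N-3)/2}\!\left(\sqrt{3}\,\sqrt{Q^2}\right)} . \end{aligned} \]

Then the test is choose Gaussian if

\[ 0 \le \ln(\pi/2) - \left(\tfrac{N+3}{2}\right) \ln 3 + \left(\tfrac{N-3}{2}\right) \ln Q^2 - Q^2 - 2 \ln\left[ K_{(N-3)/2}\!\left(\sqrt{3}\,\sqrt{Q^2}\right) \right] . \]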


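4.2.3 GAUSSIAN AND LAPLACE HYPOTHESES

For this case, the likelihood parameter becomes

\[ \lambda = \frac{p_G}{p_L} = \pi^{-N/2}\; \frac{\exp\left[-\left(Z_1^2 + \cdots + Z_N^2\right)/2\right]}{\exp\left[-\sqrt{2}\left(|Z_1| + \cdots + |Z_N|\right)\right]} \]

or

\[ \lambda = \left(\pi^{-N/2}\,e^{N}\right) \exp\left[-(1/2) \sum_{i=1}^{N} \left(|Z_i| - \sqrt{2}\right)^2\right] . \]

Then, in terms of the log-likelihood ratio test we choose $H_1$ (Gaussian) if

\[ 2\left[N - (N/2) \ln \pi\right] \ge \sum_{i=1}^{N} \left(|Z_i| - \sqrt{2}\right)^2 . \]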
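
To show how such a test feeds the Monte Carlo error estimate, the sketch below applies the Gaussian versus Laplace rule to computer-generated sets and counts the incorrect decisions. It is a simplified stand-in for the programs used in the study: Python's built-in `random.gauss` and `random.expovariate` replace the inverse-integral generators of Section 2, and all function names are hypothetical.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def choose_gaussian(z):
    """Log-likelihood ratio test between unit variance Gaussian and Laplace densities."""
    n_var = len(z)
    threshold = 2.0 * (n_var - 0.5 * n_var * math.log(math.pi))
    statistic = sum((abs(zi) - SQRT2) ** 2 for zi in z)
    return statistic <= threshold

def laplace_variate(rng):
    """Unit variance Laplace variate: exponential magnitude (rate sqrt(2)) with a random sign."""
    magnitude = rng.expovariate(SQRT2)
    return magnitude if rng.random() < 0.5 else -magnitude

def monte_carlo_errors(n_var, trials, rng):
    """Fractions of wrong decisions when the Gaussian is true and when the Laplace is true."""
    wrong_given_gaussian = 0
    wrong_given_laplace = 0
    for _ in range(trials):
        if not choose_gaussian([rng.gauss(0.0, 1.0) for _ in range(n_var)]):
            wrong_given_gaussian += 1
        if choose_gaussian([laplace_variate(rng) for _ in range(n_var)]):
            wrong_given_laplace += 1
    return wrong_given_gaussian / trials, wrong_given_laplace / trials

error_gaussian, error_laplace = monte_carlo_errors(5, 2000, random.Random(1981))
```

The two returned fractions estimate $P(D_0 | H_1)$ and $P(D_1 | H_0)$ respectively; their sampling uncertainty shrinks roughly as the inverse square root of the number of trials.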


4.2.4  GAUSSIAN  AND  GAMMA  HYPOTHESES 

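For this pair of probability densities, the likelihood ratio parameter becomes

\[ \lambda = \frac{p_G}{p_\Gamma} = \left(\frac{N^{(N+3)/2}}{\sqrt{2}\,\pi^{N/2}\,(N+2)^{(N+1)/2}}\right) \frac{\exp\left[-\left(Z_1^2 + \cdots + Z_N^2\right)/2\right]}{\left(|Z_1| + \cdots + |Z_N|\right) \exp\left[-\sqrt{2(N+2)/N}\,\left(|Z_1| + \cdots + |Z_N|\right)\right]} \]

or

\[ \lambda = \left(\frac{N^{(N+3)/2}\,e^{N+2}}{\sqrt{2}\,\pi^{N/2}\,(N+2)^{(N+1)/2}}\right) \left(\sum_{i=1}^{N} |Z_i|\right)^{-1} \exp\left[-(1/2) \sum_{i=1}^{N} \left(|Z_i| - \sqrt{2(N+2)/N}\right)^2\right] . \]

Thus, for the log-likelihood ratio test, the hypothesis test becomes: choose $H_1$ if

\[ \left(\tfrac{N+3}{2}\right) \ln N - (1/2) \ln 2 - \left(\tfrac{N+1}{2}\right) \ln (N+2) - (N/2) \ln \pi + (N+2) \ge D \]

where

\[ D = (1/2) \sum_{i=1}^{N} \left(|Z_i| - \sqrt{2(N+2)/N}\right)^2 + \ln \sum_{i=1}^{N} |Z_i| . \]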


421  K.rror  Probabilities 

In Section 1.2 and Section 4, we discussed error probabilities in relation to the decisions of the various hypothesis tests of interest. The concept is based on identifying a statistic, associated with both distributions, that determines the two decision regions of the hypothesis test for the particular case. The errors are the probabilities of incorrect action when a given hypothesis is true (see Mood and Graybill). For example, the probability of a type I error for decision $D_0$ is the probability that the value of the statistic falls in region $R_1$ given that $H_0$ is true, and the corresponding type II error probability is the probability that the value of the statistic falls in region $R_0$ given that $H_1$ is true. The expressions for these conditional probabilities have been outlined in Section 4.1. They reduce to integrations of the PDF of the particular statistic over the decision region of the opposite distribution.

As has been pointed out, it is not always clear what form the PDF of the statistic will have, particularly when the densities in the hypothesis test are complicated and dissimilar in form. For two cases, though, we have established the desired PDFs. Comparing the analytic results for those cases with the Monte Carlo results will help to assess the validity of the latter approach for cases where the analytic results are not available.

If we set the statistic $q = \sum_{i=1}^{N} z_i^2$ and postulate that the multivariate distribution is Gaussian, then the density of the statistic is given by

$$ p_q(q) = \begin{cases} \left[ 2^{N/2}\, \Gamma(N/2) \right]^{-1} q^{(N-2)/2}\, e^{-q/2} , & q \ge 0 \\ 0 , & q < 0 . \end{cases} $$

For the case where the multivariate distribution is exponential, the density is

$$ p_q(q) = \begin{cases} (N+1)^{N/2} \left[ 2\, \Gamma(N) \right]^{-1} q^{(N-2)/2}\, e^{-\sqrt{N+1}\,\sqrt{q}} , & q \ge 0 \\ 0 , & q < 0 . \end{cases} $$

These relations are discussed in greater detail in Appendix B.


5. RESULTS

In this section the topics discussed in the preceding sections are combined. Computer programs were written to generate the different sets of random variates and to use them in the associated simple alternative hypothesis tests. These results, the analysis of some of the related questions regarding the numerical aspects, and the analytic error results are all presented.

5.1 Monte Carlo Applications

Five-variate jointly distributed sets were generated for the various forms. In each case the alternative decision was that the variates came from a Gaussian distribution. For the hypothesis test appropriate to each case, the number of correct decisions was obtained for the non-Gaussian sets, and the corresponding number of correct decisions was then determined for an equal number of five-variate Gaussian samples. The probability of a correct Gaussian decision changes from case to case since the form of the test depends on the two distributions considered.

The first comparison is between the Gaussian and the multivariate exponential. The Gaussian distributions were correctly identified 72.2 percent of the time. Alternatively, the exponential sets of variates were correctly chosen in 43.3 percent of the cases. Next, we looked at decisions between Gaussian and other nonindependent densities. The test decided correctly for the Bessel variates at a 50 percent rate and for the related Gaussian sets at 76 percent. For gamma-related variates, 56 percent correct decisions were made, while for that case the alternative Gaussian decision was correct 64.7 percent of the time. Finally, we considered an alternative hypothesis test that consisted of sets of Laplacian variates. This is a case where the variates, like Gaussians, are independent. It was included to see if the single-function integration required for an independent joint distribution would affect the ratio of correct decisions. For the Laplace case, the test was correct 56.5 percent of the time and correspondingly 68.5 percent for the Gaussians.
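The Gaussian/Laplacian experiment is the simplest of these cases to reproduce, since both sets of variates are independent. The following is a minimal sketch, not the original program: it assumes the decision rule of Section 4 (choose Gaussian when 2[N - (N/2) ln pi] is at least the sum of (|z_i| - sqrt 2)^2), and it uses the standard library generators instead of the uniform-variate CDF inversion described in the report. The names `chooses_gaussian` and `correct_rates` are illustrative.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def gaussian_set(n, rng):
    """n independent zero mean, unit variance Gaussian variates."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def laplace_set(n, rng):
    """n independent unit variance Laplacian variates.

    The magnitude is exponential with rate sqrt(2); the sign is random.
    """
    return [rng.choice((-1.0, 1.0)) * rng.expovariate(SQRT2) for _ in range(n)]

def chooses_gaussian(z):
    """Log-likelihood ratio decision between Gaussian and Laplacian forms."""
    n = len(z)
    threshold = 2.0 * (n - 0.5 * n * math.log(math.pi))
    statistic = sum((abs(v) - SQRT2) ** 2 for v in z)
    return threshold >= statistic

def correct_rates(n=5, trials=20000, seed=1):
    """Fractions of correct decisions for Gaussian and Laplacian populations."""
    rng = random.Random(seed)
    gauss_ok = sum(chooses_gaussian(gaussian_set(n, rng)) for _ in range(trials))
    laplace_ok = sum(not chooses_gaussian(laplace_set(n, rng)) for _ in range(trials))
    return gauss_ok / trials, laplace_ok / trials
```

Calling `correct_rates()` returns the two fractions for N = 5, which can be compared with the percentages quoted for the Laplace case; the exact values depend on the seed and the number of trials.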
5.2 Effect of Numerical Integration

As was discussed in Section 2 and Section 3, independent variates require only a single function to be integrated, and for many cases the integration does not have to be done numerically. In contrast, the nonindependent variates require successive integrations of different functions that include the preceding variates of the set. Thus, there is a question as to whether these repeated numerical integrations could have an effect on the values assigned to the variates and, consequently, on the outcome of the decision process for those variates.

To address the question, the gamma-like set of jointly distributed variates was examined again. For this second analysis, the five successive marginal densities of the gamma-like function were integrated analytically to give the corresponding forms for their respective cumulative distribution functions. These forms can then be related directly to the set of uniform variates. Thus, we can compare the results of the hypothesis test for the same dependent multivariate distribution when the numerical integrations have been removed from the process. For the five-variate gamma-like case without numerical integration, the test correctly identified the sets in 58.5 percent of the cases, and for the related Gaussian test the result was correct 66 percent of the time.
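The comparison between numerical and analytic integration can be illustrated for the first variate of the gamma-like set. The sketch below assumes the first marginal density of Appendix A for N = 5, whose prefactor reduces to C2/10, and integrates it once by the trapezoid rule and once in closed form. The trapezoid rule, the bisection inversion, and all function names are choices made here for illustration; the report does not state which quadrature was used.

```python
import math

C2 = math.sqrt(14.0 / 5.0)  # Appendix A value of C2 for N = 5

def pdf_z1(z):
    """First marginal density p(z1); 16*C1/C2**5 reduces to C2/10 for N = 5."""
    return (C2 / 10.0) * (4.0 + C2 * abs(z)) * math.exp(-C2 * abs(z))

def cdf_analytic(z):
    """Closed-form cumulative distribution of z1."""
    tail = (5.0 + C2 * abs(z)) * math.exp(-C2 * abs(z)) / 10.0
    return 1.0 - tail if z >= 0.0 else tail

def cdf_numeric(z, lower=-20.0, steps=20000):
    """Trapezoid-rule integral of pdf_z1 from a far lower limit up to z."""
    h = (z - lower) / steps
    total = 0.5 * (pdf_z1(lower) + pdf_z1(z))
    for k in range(1, steps):
        total += pdf_z1(lower + k * h)
    return total * h

def invert(cdf, u, lo=-20.0, hi=20.0, iterations=60):
    """Find z with cdf(z) = u by bisection; u is a uniform variate in (0, 1)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With these definitions, `invert(cdf_analytic, u)` and `invert(cdf_numeric, u)` map the same uniform variate to nearly the same value of z1, which is the kind of agreement the percentages above suggest.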

5.3 Effect of an Increase in Variates

In parameter estimation theory we can obtain a better estimate for the parameters of a distribution by increasing the number of samples of the population in the estimator. Similarly, in typical hypothesis testing, the results tend to improve as we increase the number of observations that appear in the product of the likelihood functions that determines the likelihood ratio for the hypothesis test. However, that is not what we are constructing here. In the present formulation we employ only a single observation of a multivariate PDF as the complete likelihood ratio. For this to be similar to the preceding situations, we would have to use multiple observations of the joint density population. Increasing the number of variates in the respective joint distributions is not equivalent to increasing the number of observations; thus, it is not obvious what effect such an increase would have on the reliability of the hypothesis testing procedure. This aspect has been studied for the two cases of multivariate exponential and Bessel distributions.

As can be seen in Table 1, the two forms result in different decision histories as the number of variates increases. The first part of the table shows the results for each multivariate level when two data sets, one Gaussian and one exponential, are entered into the Gaussian/exponential hypothesis tests. The second part of the table shows the corresponding results for two additional sets of populations, one Gaussian and the other Bessel, in the Gaussian/Bessel hypothesis tests. In the evaluation, a large number of Gaussian sets are tested against the decision criterion. Then a second series of either Bessel variates or exponential variates is tested. The ratios of decisions correctly identifying the known population to the total tries for that particular population determine the percentages shown in the table. For both sets of hypothesis tests, the percentage of correct Gaussian decisions increases with N, as does the percentage of correct Bessel distribution decisions. This is not true for the exponential populations in the Gaussian/exponential tests.


Table 1. Percentage of Correct Decisions for Four Sets of RNG Populations in Gaussian/Exponential and Gaussian/Bessel Tests as a Function of the Number of Variates

Gaussian and Exponential Sets

  N     Correct Gauss.    Correct Exp.
  5         72.1              43.5
 10         74.3              43.4
 25         74.5              41.5
 50         75                40.5

Gaussian and Bessel Sets

  N     Correct Gauss.    Correct Bessel
  5         75                50
 10         83.7              54
 25         86                62.5
 50         93.5              70

5.4  Analytic  Results  and  Comparisons 

The corresponding analytic error evaluation for the test of Gaussian or exponential hypotheses has been discussed in Section 1 and Section 4. The reason for comparing the analytic and random variate generation approaches is to establish confidence in the Monte Carlo approach, which can be applied more generally, especially in cases where the analysis is too complicated. Here we will present the results for both types of error with each hypothesis and compare the results with the related Monte Carlo percentages.
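The analytic evaluation for the Gaussian/exponential pair can be sketched numerically. The code below is an illustration under stated assumptions, not the original calculation: it takes the zero mean, unit variance Gaussian and exponential densities of Appendix B, finds the interval of q = z_1^2 + ... + z_N^2 on which the Gaussian density is the larger, and integrates each density of q over the appropriate region using the regularized incomplete gamma function. The series used for that function and all function names are choices made here.

```python
import math

def reg_lower_gamma(s, x):
    """Regularized lower incomplete gamma function P(s, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / s
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= x / (s + k)
        total += term
        k += 1
    return total * math.exp(s * math.log(x) - x - math.lgamma(s))

def gaussian_region(n):
    """Interval (q_lo, q_hi) on which the Gaussian density exceeds the exponential.

    Assumes the quadratic in sqrt(q) has two real roots, as it does for N = 5.
    """
    a = math.sqrt(n + 1.0)
    # Log of the exponential-density normalization constant (Appendix B form).
    ln_k = (0.5 * n * math.log(n + 1.0) - n * math.log(2.0)
            - 0.5 * (n - 1) * math.log(math.pi) - math.lgamma(0.5 * (n + 1)))
    c = 0.5 * n * math.log(2.0 * math.pi) + ln_k
    r = math.sqrt(a * a - 2.0 * c)
    return (a - r) ** 2, (a + r) ** 2

def correct_decision_probabilities(n):
    """Probabilities of a correct decision for Gaussian and exponential sets."""
    q_lo, q_hi = gaussian_region(n)
    a = math.sqrt(n + 1.0)
    # Gaussian case: q is chi-square with n degrees of freedom.
    p_gauss = reg_lower_gamma(0.5 * n, 0.5 * q_hi) - reg_lower_gamma(0.5 * n, 0.5 * q_lo)
    # Exponential case: sqrt(q) is gamma distributed with shape n and rate a.
    p_wrong = reg_lower_gamma(n, a * math.sqrt(q_hi)) - reg_lower_gamma(n, a * math.sqrt(q_lo))
    return p_gauss, 1.0 - p_wrong
```

For N = 5, `gaussian_region(5)` gives an interval of approximately 2.51 to 10.99 in q, and `correct_decision_probabilities(5)` gives values that can be set beside the first row of Table 1.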

For N = 5, the type I error associated with the exponential hypothesis is based on $P(2.51 \le q \le 10.987)$ and the related type II error is based on the probability

5.5 Summary of Results

In the preceding sections a number of interesting results have been enumerated. In this section we will attempt to place those results in perspective.

In all the simple alternative cases studied, the percentage of correct Gaussian decisions is always the higher of the two. Also, there does not appear to be any significant distinction between independent and nonindependent forms as the alternative.

For the case where numerical integration of a nonindependent multivariate form was replaced by an equivalent analytic result, the change was slight. Thus, numerical integration does not appear to be a problem in testing the error probabilities to be expected for a decision.

The decrease in correct decisions with increased variates in the exponential case contrasts with the Gaussian results for that pair of alternatives and with both sets of results when the Bessel form is the alternative to the Gaussian. The difference in behavior appears in both the random variate and the analytic results.

In the exponential case, a further test was made for N = 100. As a result of the prolonged run time on the computer, this case was halted after 200 decisions. At that point, the ratio of correct decisions was: Gaussian 79 percent and exponential 38.5 percent. This is slightly off the predicted trend line value for the respective cases but still is quite close. The corresponding analytic results for that case are: Gaussian 75.7 percent and exponential 40.8 percent.

For all the levels of variates considered there is good agreement between the analytic and random number generated results for both exponential and Bessel alternatives. This would tend to give confidence in the use of the computer technique to evaluate the decision process.

In all these results the main factor is that the decision process and the associated decision regions are based on defining those portions of the probability space where one of the two alternatives has a greater probability of occurring. As the test now stands, there is no assessment in the decision as to the relative magnitude of the second distribution in that region. The high probability of incorrect decisions for the alternative distributions is related to such variates having a high probability in regions where, nevertheless, the Gaussian PDF dominates. Since we are not dealing with multiple observations of these multivariate distributions, we are limited in our ability to control the resultant unsatisfactorily large error probabilities. It should be noted that for the Bessel case the error decreases as the number of variates increases, which is a desirable result in terms of applying these results to terrain characterization.


6.  TERRAIN  CHARACTERIZATION  APPLICATIONS 

The purpose of the statistical studies described in this report is to establish confidence levels and limitations for the hypothesis testing procedures used in the terrain characterization studies of our program to analyze electromagnetic scattering from rough surfaces. In previous sections we have pointed out that we are using sets of uncorrelated variates having distributions with zero mean and unit variance. In this section we show the relation between those conditions and the more general ones required in terrain studies.


6.1 Nonzero Mean, Correlated Variates

In the terrain analysis hypothesis testing using multivariate density functions, the decisions are based on the values of the quadratic form $F = (z' - \mu)^T R^{-1} (z' - \mu)$. This involves the inverse of the covariance matrix $R$ and the associated nonzero mean ($\mu$), correlated heights, $z'$. In our earlier report¹ we discussed the transformations of coordinates used to generate appropriate PDFs for the variates. Similar relations will be used here to show that it is sufficient to derive relationships for sets of uncorrelated, zero mean variates. The results then apply to the general multivariate case as well.

First, consider the general multivariate Gaussian PDF. It has a form that is well known.⁷

Let $z' = (z'_1, z'_2, \ldots, z'_N)$ be an N-dimensional random variable in vector form. Then, the multivariate Gaussian PDF is given by

$$ p(z') = (2\pi)^{-N/2}\, |R|^{-1/2} \exp\left[ -(1/2)\, (z' - \mu)^T R^{-1} (z' - \mu) \right] . $$

The question to be resolved is the relation between the very general form represented by this relation and the more simplified forms used in the discussions of this report. Trivially, for $y = z' - \mu$, $F = y^T R^{-1} y$. Next, consider the eigenvalues of the covariance matrix $R$ and their associated eigenvectors; let $\Gamma$ be the eigenvector column matrix for $R$ and $\lambda$ the eigenvalue vector. Then, consider the matrix

$$ A = \Gamma\, \mathrm{diag}\left( \lambda_1^{1/2}, \ldots, \lambda_N^{1/2} \right) . $$


The vector $u$, defined by $u = A^{-1} y$, is a new vector representation consisting of uncorrelated, $\sigma^2 = 1$ variates. Then,

$$ y = A u \quad \text{and} \quad y^T = u^T A^T . $$

Also,

$$ R = A A^T \quad \text{and} \quad R^{-1} = (A^T)^{-1} A^{-1} . $$

Then,

$$ F = y^T R^{-1} y = u^T A^T (A^T)^{-1} A^{-1} A u = u^T u . $$

Thus, for our terrain hypothesis testing, the value of the quadratic form is unaffected by the degree of covariance among the variates, and the results obtained using sequences of uncorrelated, zero mean, unit variance variates are quite general.
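The invariance of the quadratic form can be checked on a small example. The sketch below is illustrative only: it factors R with a Cholesky decomposition instead of the eigenvector construction, which is a substitution made here for brevity, since any matrix A satisfying R = A A^T leads to the same value of F. The covariance matrix, the vector y, and the function names are made up for the example.

```python
import math

def cholesky(r):
    """Lower-triangular A with A A^T = r, for a symmetric positive definite r."""
    n = len(r)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = r[i][j] - sum(a[i][k] * a[j][k] for k in range(j))
            a[i][j] = math.sqrt(acc) if i == j else acc / a[j][j]
    return a

def whiten(a, y):
    """Solve A u = y by forward substitution, giving u = A^{-1} y."""
    u = []
    for i in range(len(y)):
        u.append((y[i] - sum(a[i][k] * u[k] for k in range(i))) / a[i][i])
    return u

def quadratic_form_2x2(r, y):
    """Direct evaluation of y^T R^{-1} y using the explicit 2 x 2 inverse."""
    det = r[0][0] * r[1][1] - r[0][1] * r[1][0]
    inv = [[r[1][1] / det, -r[0][1] / det],
           [-r[1][0] / det, r[0][0] / det]]
    return sum(y[i] * inv[i][j] * y[j] for i in range(2) for j in range(2))

R = [[4.0, 1.2], [1.2, 1.0]]   # hypothetical covariance matrix
y = [1.0, -0.5]                # hypothetical zero mean height deviations
u = whiten(cholesky(R), y)
f_direct = quadratic_form_2x2(R, y)
f_whitened = sum(v * v for v in u)
```

Here `f_direct` and `f_whitened` agree to rounding error, so the decision statistic computed from the correlated heights equals the one computed from the uncorrelated, unit variance variates.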

6.2 Discussion

As has been pointed out, the hypothesis test is based on minimizing the total probability of error. Further, we assigned equal probability for the heights to be exponential or Gaussian and equal costs to either incorrect decision. Given these factors, we found that the test was more likely to identify correctly a sample from a Gaussian distribution than one from an exponential distribution. Alternatively, we could abandon the minimal cost and modify the form of the test by readjusting the decision regions to allow equal error probabilities. This is equivalent to a minimal cost condition in which we have biased either the cost of an incorrect decision for one distribution or the a priori probabilities. For instance, we could use terrain features to assign a value greater than one half to the probability that points are distributed in a given form. These aspects have not yet been incorporated into the decisions of the terrain characterization program.

The preceding conclusion is interesting in terms of the decisions generated by the terrain data base in Massachusetts used in the initial electromagnetic calculations. In the original report,¹ there is some discussion of those results. When corrections to eliminate some of the errors induced by inverting a large covariance matrix are included, the results are still overwhelmingly in favor of the points being from an exponential distribution. This is disturbing in light of the expected bias towards Gaussian distributions found in the error analysis. Consequently, there are several additional approaches to characterization currently being pursued.² There still appear to be numerical problems associated with the decision process that will have to be resolved. One additional aspect of interest is the effect that the quantization in height of the original data base has on the resultant distributions; this also is under investigation.

In this report, we have discussed some of the statistical implications of the hypothesis testing approach to terrain characterization as applied to our electromagnetic scattering analyses. The emphasis has been on understanding constraints, confidence levels, and errors associated with the decision process. This leads to the development of random number generation techniques for multivariate nonindependent distributions and to further consideration of the effects of numerical accuracy and errors on the decision outcome.

Appendix A


A  Multivariate  Probability  Distribution 
Function:  Derivation  and  Analysis 


In this study the cumulative distribution functions for the cases of independent variates are analytically integrable, and hence the relation between the uniform random variates and the new set is readily obtained. For the nonindependent cases of exponential or Bessel PDFs, however, this is not so, and the value of the new random variate has to be determined as the result of a numerical integration. To examine whether this introduces any effect into the results, particularly those of the hypothesis testing procedures, we formed a new type of nonindependent PDF. This form is analytically integrable for the case N = 5 and hence allows a comparison between results when the same set of five functions is evaluated both analytically and numerically.

The form selected for this purpose is

$$ p(z_1, \ldots, z_N) = C_1 \left( |z_1| + |z_2| + \cdots + |z_N| \right) \exp\left[ -C_2 \left( |z_1| + \cdots + |z_N| \right) \right] . $$

As in previous work we use the zeroth and second moment integrals to determine the normalization constants to satisfy the requirements for this to be a probability density. This leads to

$$ C_1 = C_2^{N+1} / (2^N N) \quad \text{and} \quad C_2^2 = 2(N+2)/N . $$

In addition to the parameters of the general multivariate density function, the procedures also require the form for the general marginal density of arbitrary order,

$$ p(z_1, \ldots, z_L) = \left( 2^{N-L} C_1 / C_2^{N-L+1} \right) \left[ (N-L) + C_2 \left( |z_1| + \cdots + |z_L| \right) \right] \exp\left[ -C_2 \left( |z_1| + \cdots + |z_L| \right) \right] , \quad \text{for } 1 \le L \le N . $$
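For actual use we employed N = 5 and the appropriate set of functions is:

$$ p(z_1, \ldots, z_5) = C_1 \left( |z_1| + \cdots + |z_5| \right) \exp\left[ -C_2 \left( |z_1| + \cdots + |z_5| \right) \right] , $$

$$ p(z_1, \ldots, z_4) = (2C_1/C_2^2) \left[ 1 + C_2 \left( |z_1| + \cdots + |z_4| \right) \right] \exp\left[ -C_2 \left( |z_1| + \cdots + |z_4| \right) \right] , $$

$$ p(z_1, z_2, z_3) = (4C_1/C_2^3) \left[ 2 + C_2 \left( |z_1| + |z_2| + |z_3| \right) \right] \exp\left[ -C_2 \left( |z_1| + |z_2| + |z_3| \right) \right] , $$

$$ p(z_1, z_2) = (8C_1/C_2^4) \left[ 3 + C_2 \left( |z_1| + |z_2| \right) \right] \exp\left[ -C_2 \left( |z_1| + |z_2| \right) \right] , $$

and

$$ p(z_1) = (16C_1/C_2^5) \left[ 4 + C_2 |z_1| \right] \exp\left[ -C_2 |z_1| \right] , $$

where

$$ C_1 = 14^3 / (5^4\, 2^5) $$

and

$$ C_2^2 = 14/5 . $$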
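The way these marginals are used to generate a dependent five-variate set can be sketched as follows. The conditional distribution of each new variate, given the ones already drawn, is the ratio of two successive marginals. Writing b = (N - L) + C2 s, where s is the sum of the magnitudes already drawn, the probability that the new magnitude exceeds t is (b + 1 + C2 t) exp(-C2 t) / (b + 1). That expression is derived here from the forms above and is not quoted from the report. The bisection inversion and all names are likewise choices made for this illustration.

```python
import math
import random

N = 5
C2 = math.sqrt(2.0 * (N + 2) / N)  # C2**2 = 14/5 for N = 5

def conditional_tail(t, b):
    """P(|z| > t) for the next variate, given b = (N - L) + C2 * (earlier |z| sum)."""
    return (b + 1.0 + C2 * t) * math.exp(-C2 * t) / (b + 1.0)

def sample_magnitude(v, b, iterations=60):
    """Solve conditional_tail(t, b) = v for t by bisection; the tail decreases in t."""
    lo, hi = 0.0, 60.0 / C2
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if conditional_tail(mid, b) > v:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sample_set(rng):
    """Draw one jointly distributed set (z_1, ..., z_N), one variate at a time."""
    z = []
    s = 0.0  # running sum of |z_i| for the variates already drawn
    for level in range(1, N + 1):
        b = (N - level) + C2 * s
        t = sample_magnitude(1.0 - rng.random(), b)  # uniform variate in (0, 1]
        z.append(t if rng.random() < 0.5 else -t)    # the density is even in z
        s += t
    return z
```

A simple sanity check is to draw a few thousand sets with `sample_set(random.Random(seed))` and confirm that each variate has a sample mean near zero and a sample variance near one, as the normalization of Appendix A requires.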
Appendix B

Functions  of  a  Sequence  of  Random  Variates 


In this appendix we will discuss the appropriate forms for the probability density function for the statistic $q = \sum_{i=1}^{N} z_i^2$ when the original multivariate density is either Gaussian or exponential with zero mean and unit variance.

The derivation is based on the principle that if there is a collection of random variates $\{z_i\}$ and a function $g(z_1, z_2, \ldots, z_N)$, then generally $x = g(z_1, \ldots, z_N)$ is a random variate, and we are concerned with determining the form of its density function and that of additional functions, $g_1(x)$, once the basic form is established. Consider the hypershell $\Delta D_x$ of the $(z_1, \ldots, z_N)$ space such that

$$ x < \sqrt{z_1^2 + \cdots + z_N^2} < x + dx . $$

Here $x = \sqrt{z_1^2 + \cdots + z_N^2}$ is the basic function of the random variates and $q = x^2$ is the secondary function of interest in the error probabilities. The volume of the hypershell is given by

$$ dV = \left[ 2^N \pi^{(N-1)/2}\, \Gamma\left( \tfrac{N+1}{2} \right) / \Gamma(N) \right] x^{N-1}\, dx . $$

Papoulis shows how the two results $p_x(x)$ and $p_q(q)$ are determined for the Gaussian multivariate case,

$$ p(z_1, \ldots, z_N) = (2\pi)^{-N/2} \exp\left[ -(1/2) \left( z_1^2 + \cdots + z_N^2 \right) \right] . $$

The final result for $q = x^2$ is

$$ p_q(q) = \begin{cases} \left[ 2^{N/2}\, \Gamma(N/2) \right]^{-1} q^{(N-2)/2} \exp[-q/2] , & q \ge 0 \\ 0 , & q < 0 . \end{cases} $$

Similarly, for the exponential case where

$$ p(z_1, \ldots, z_N) = (N+1)^{N/2} \left[ 2^{(N+1)/2}\, (2\pi)^{(N-1)/2}\, \Gamma\left( \tfrac{N+1}{2} \right) \right]^{-1} \exp\left[ -\sqrt{N+1}\, \sqrt{z_1^2 + \cdots + z_N^2} \right] , $$

we obtain

$$ p_x(x) = \begin{cases} \left[ (N+1)^{N/2} / \Gamma(N) \right] x^{N-1} \exp\left[ -\sqrt{N+1}\; x \right] , & x \ge 0 \\ 0 , & x < 0 . \end{cases} $$

Then

$$ p_q(q) = \begin{cases} \left[ (N+1)^{N/2} / (2\,\Gamma(N)) \right] q^{(N-2)/2} \exp\left[ -\sqrt{N+1}\, \sqrt{q} \right] , & q \ge 0 \\ 0 , & q < 0 . \end{cases} $$
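The step from the density of x to the density of q is the usual change of variables for a monotonic function. Written out for the exponential case as a short sketch, using only the relations of this appendix:

```latex
\begin{align*}
q &= x^2 , \qquad x = \sqrt{q} , \qquad \frac{dx}{dq} = \frac{1}{2\sqrt{q}} , \qquad q > 0 , \\
p_q(q) &= p_x(\sqrt{q})\, \left| \frac{dx}{dq} \right|
        = \frac{(N+1)^{N/2}}{\Gamma(N)}\, q^{(N-1)/2}\, e^{-\sqrt{N+1}\,\sqrt{q}} \cdot \frac{1}{2\sqrt{q}} \\
       &= \frac{(N+1)^{N/2}}{2\,\Gamma(N)}\, q^{(N-2)/2}\, e^{-\sqrt{N+1}\,\sqrt{q}} .
\end{align*}
```

The same substitution applied to the Gaussian radial density gives the chi-square form quoted for that case.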